A Bayesian Approach to Estimating the Selectivity of Conjunctive Predicates
Abstract
Cost-based optimizers in relational databases use data statistics to estimate intermediate result cardinalities. These cardinalities are needed to estimate access plan costs so that the optimizer can choose the cheapest plan for executing a query. Since statistics are usually collected on single attributes only, the optimizer cannot directly estimate the result cardinalities of conjunctive predicates over multiple attributes. To avoid falling back on the assumption of statistical independence, modern relational database systems can additionally collect joint statistics over multiple attributes. These statistics allow a direct cardinality estimate for conjunctive predicates. A widely used approach is to collect the number of distinct value combinations as a joint statistic. This number supports a uniformity-based estimate, which assumes that each value combination occurs equally often. Although this estimate is likely an improvement, it is still inaccurate, since “real world” data is unlikely to be uniform. This paper proposes a new approach to estimating the result cardinality of conjunctive predicates over multiple attributes of a relation. The proposed method combines knowledge from single-column histograms using a conditional-probability-based “uniform correlation” approach. Initial evaluation shows that, for predicates on highly correlated attributes, this method yields better estimates than the classic uniformity-based approach.
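The two baseline estimates mentioned in the abstract can be illustrated with a minimal sketch. It does not implement the paper's "uniform correlation" method, whose details are not given here. The table contents and the function names `independence_estimate` and `uniformity_estimate` are hypothetical and chosen only for illustration.

```python
# Hypothetical two-column relation (make, model) with perfectly correlated columns.
rows = (
    [("Toyota", "Corolla")] * 4
    + [("Honda", "Civic")] * 3
    + [("Ford", "Focus")] * 3
)


def independence_estimate(rows, a_val, b_val):
    """Cardinality of (A = a_val AND B = b_val) assuming A and B are independent.

    Uses only single-column frequencies: n * sel(A) * sel(B).
    """
    n = len(rows)
    count_a = sum(1 for a, _ in rows if a == a_val)
    count_b = sum(1 for _, b in rows if b == b_val)
    return count_a * count_b / n


def uniformity_estimate(rows):
    """Cardinality of any equality conjunct assuming every distinct
    (A, B) combination occurs equally often: n / distinct combinations.
    """
    return len(rows) / len(set(rows))


def true_cardinality(rows, a_val, b_val):
    """Exact result size, for comparison only."""
    return sum(1 for row in rows if row == (a_val, b_val))


independence = independence_estimate(rows, "Toyota", "Corolla")
uniformity = uniformity_estimate(rows)
actual = true_cardinality(rows, "Toyota", "Corolla")
print(independence, uniformity, actual)
```

On this toy data the independence assumption underestimates the result size the most, the uniformity-based estimate is closer, and neither matches the true count, which is the gap the abstract describes.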
Similar resources
Consistently Estimating the Selectivity of Conjuncts of Predicates
Cost-based query optimizers need to estimate the selectivity of conjunctive predicates when comparing alternative query execution plans. To this end, advanced optimizers use multivariate statistics (MVS) to improve information about the joint distribution of attribute values in a table. The joint distribution for all columns is almost always too large to store completely, and the resulting use ...
Simulating and Optimizing the Conjunctive Use of Surface and Groundwater Resources Using the System Dynamics Approach (A Case Study: Dashte-Abbas Irrigation Network)
The construction of an irrigation network and the transfer of water from Karkheh Dam to Dashte-Abbas, carried out without regard for groundwater resources, have raised the groundwater level and caused waterlogging of agricultural land in recent years. The aim of this study was therefore to optimize the conjunctive use of surface and groundwater resources in Dashte-Abbas to minimize waterlogging problems and a...
HASE: A Hybrid Approach to Selectivity Estimation for Conjunctive Predicates
Current methods for selectivity estimation fall into two broad categories, synopsis-based and sampling-based. Synopsis-based methods, such as histograms, incur minimal overhead at query optimization time and thus are widely used in commercial database systems. Sampling-based methods are more suited for ad-hoc queries, but often involve high I/O cost because of random access to the underlying dat...
Estimating Steatosis Prevalence in Overweight and Obese Children: Comparison of Bayesian Small Area and Direct Methods
Background: Often, the available sample size is too small to estimate prevalence with a direct estimator in all areas. The aim of this study was to compare the Bayesian small area method and the direct method in estimating the prevalence of steatosis in obese and overweight children. Materials and Methods: This cross-sectional study was conducted on 150 overweight and obese ...
Estimating E-Bayesian of Parameters of two parameter Exponential Distribution
In this study, the E-Bayesian estimates of the parameters of the two-parameter exponential distribution under the squared error loss function are obtained. The estimates and the efficiency of the proposed method are compared with those of the Bayesian estimator using Monte Carlo simulation.